chore(recipe): bump dynamo-platform from 0.9.x to 1.0.1 #459

Open
Jont828 wants to merge 4 commits into NVIDIA:main from Jont828:chore/bump-dynamo-platform-1.0.1

Jont828 (Contributor) commented Mar 24, 2026

Summary

  • Upgrade dynamo-platform from 0.9.x to the latest 1.0.1 release across the component registry and all 5 Dynamo inference overlay recipes
  • Rewrite recipes/components/dynamo-platform/values.yaml for the 1.0 Helm schema: global.* subchart controls, upgradeCRD: true, removed stale image pin and kube-rbac-proxy workaround (fixed upstream)
  • dynamo-crds version intentionally unchanged (no 1.0 CRD chart exists; platform chart now bundles CRDs via upgradeCRD)

Test plan

  • go test -race ./pkg/recipe/... -count=1 — passes
  • go test -race ./pkg/bundler/... -count=1 — passes
  • make test — all tests pass, coverage 72%+
  • make lint (golangci-lint + yamllint on changed files) — clean
  • KWOK e2e with a Dynamo overlay (make kwok-e2e RECIPE=h100-eks-ubuntu-inference-dynamo): pending
  • Deploy to AKS/EKS cluster and verify dynamo-platform 1.0.1 chart renders correctly with global.* keys: pending

🤖 Generated with Claude Code

Upgrade Dynamo platform to the latest 1.0.1 release across registry
and all inference overlay recipes. Key changes for the 1.0 schema:

- Registry: defaultVersion 0.9.1 → 1.0.1
- Overlays: all 5 dynamo overlays updated from 0.9.0 → 1.0.1
- Values: rewritten for 1.0 Helm schema (global.* subchart controls,
  upgradeCRD: true, removed stale image pins and kube-rbac-proxy
  workaround fixed upstream)

Signed-off-by: Jont828 <jt572@cornell.edu>
copy-pr-bot bot commented Mar 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

dims (Collaborator) commented Mar 31, 2026

/ok to test 852739e

ayuskauskas requested a review from a team as a code owner on March 31, 2026 17:30
Jont828 (Contributor, Author) commented Mar 31, 2026

@dims Merged changes from main, can we get another CI run?

dims (Collaborator) commented Mar 31, 2026

/ok to test 852739e
/ok to test 7330295
/ok to test 6793278

dims (Collaborator) commented Apr 2, 2026

/ok to test 852739e
/ok to test 7330295
/ok to test 6793278
/ok to test 652031e

yuanchen8911 (Contributor) left a comment

Review

Clean, well-scoped change (+31/-34, 7 files). The 0.9.x → 1.0.1 migration and Helm schema rewrite look correct overall, with one significant issue:

High: Grove dropped from deployment without external replacement

The old values had grove: enabled: true at the top level, deploying grove as a subchart of dynamo-platform. The new values set global.grove.install: false + global.grove.enabled: true, which tells the Dynamo operator "grove is installed externally" — but AICR has no standalone grove component:

  • No grove entry exists in recipes/registry.yaml (only nodeScheduling paths referencing the subchart at L376/379)
  • No Dynamo overlay lists grove as a dependencyRef

This effectively removes grove from the bundle, which is a behavioral regression for multinode/PodClique-based Dynamo workloads. The upstream 1.0.1 docs confirm global.grove.enabled: true only enables integration with an already-installed grove — it does not deploy it.
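
For readers without the diff open, the grove change described in this finding can be pictured as the before/after fragment below. It is a sketch reconstructed only from the keys named in this review. The surrounding layout of recipes/components/dynamo-platform/values.yaml is assumed, and the real file contains other settings that are omitted here.

```yaml
# Before (0.9.x values): grove is deployed as a subchart of dynamo-platform.
grove:
  enabled: true
---
# After (this PR, 1.0 schema), as described in the review:
# the operator integrates with grove, but the chart no longer installs it.
global:
  grove:
    install: false   # grove is expected to be installed externally
    enabled: true    # operator-side grove integration stays on
```

With the second fragment and no standalone grove component in the bundle, nothing installs grove, which is the regression this finding points at.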

Fix options:

  1. Add a standalone grove component to registry.yaml and the Dynamo overlay dependencyRefs, or
  2. Set global.grove.install: true to keep deploying grove as a subchart
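
Option 2 might look roughly like the fragment below. This is a minimal sketch using only the two keys quoted in this review. It has not been rendered against the 1.0.1 chart, and it assumes install: true restores the subchart deployment as option 2 states.

```yaml
# Sketch of fix option 2 (assumed layout, not taken from the PR diff).
global:
  grove:
    install: true   # deploy grove as a subchart of dynamo-platform
    enabled: true   # keep the operator's grove integration on
```

Option 1 is not sketched here because it depends on the registry.yaml and overlay schemas, which this conversation does not show.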

Minor observations

  • Registry/overlay versions now aligned — registry was 0.9.1 while overlays pinned 0.9.0; both are 1.0.1 now. Good.
  • dynamo-crds + upgradeCRD: true — overlays still have dependencyRefs: [dynamo-crds]. Confirm the separate CRD chart and platform chart's upgradeCRD don't cause CRD ownership conflicts.
  • KWOK e2e and cluster deploy unchecked — given this is a schema rewrite, recommend completing before merge.

5 participants